1 Hardware and software support for efficient exception handling
Program-synchronous exceptions, for example, breakpoints, watchpoints, illegal opcodes,
and memory access violations, provide information about exceptional conditions, 
interrupting the program and vectoring to an operating system handler. Over the last 
decade, however, programs and run-time systems have increasingly employed these 
mechanisms as a performance optimization to detect normal and expected conditions. 
Unfortunately, current archi ... 

2 Decoupled hardware support for distributed shared memory

Steven K. Reinhardt, Robert W. Pfile, David A. Wood 

May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd

annual international symposium on Computer architecture, volume 24 issue 2 
This paper investigates hardware support for fine-grain distributed shared memory (DSM) in
networks of workstations. To reduce design time and implementation cost relative to 
dedicated DSM systems, we decouple the functional hardware components of DSM support, 
allowing greater use of off-the-shelf devices. We present two decoupled systems, Typhoon-0 
and Typhoon-1. Typhoon-0 uses an off-the-shelf protocol processor and network interface; a 
custom access control device is the only DSM-specific hard ... 



3 An analysis of operating system behavior on a simultaneous multithreaded architecture
Joshua A. Redstone, Susan J. Eggers, Henry M. Levy 

November 2000 Proceedings of the ninth international conference on Architectural 

support for programming languages and operating systems, volume 28 , 
This paper presents the first analysis of operating system execution on a simultaneous
multithreaded (SMT) processor. While SMT has been studied extensively over the past 6
years, previous research has focused entirely on user-mode execution. However, many of
the applications most amenable to multithreading technologies spend a significant fraction of
their time in kernel code. A full understanding of the behavior of such workloads therefore
requires execution and measurement of the operating sy ... 

4 An analysis of operating system behavior on a simultaneous multithreaded architecture
Joshua A. Redstone, Susan J. Eggers, Henry M. Levy 
November 2000 ACM SIGPLAN Notices, volume 35 issue 11 
This paper presents the first analysis of operating system execution on a simultaneous
multithreaded (SMT) processor. While SMT has been studied extensively over the past 6 
years, previous research has focused entirely on user-mode execution. However, many of 
the applications most amenable to multithreading technologies spend a significant fraction of 
their time in kernel code. A full understanding of the behavior of such workloads therefore 
requires execution and measurement of the operating sy ... 

5 Software-controlled caches in the VMP multiprocessor
D. R. Cheriton, G. A. Slavenburg, P. D. Boyle 
annual international symposium on Computer architecture, volume 14 issue 2
VMP is an experimental multiprocessor that follows the familiar basic design of multiple
processors, each with a cache, connected by a shared bus to global memory. Each processor 
has a synchronous, virtually addressed, single master connection to its cache, providing 
very high memory bandwidth. An unusually large cache page size and fast sequential 
memory copy hardware make it feasible for cache misses to be handled in software, 
analogously to the handling of virtual memory page faults. Har ... 

6 A VLIW architecture for a trace scheduling compiler
Robert P. Colwell, Robert P. Nix, John J. O'Donnell, David B. Papworth, Paul K. Rodman 

October 1987 Proceedings of the second international conference on Architectural
support for programming languages and operating systems, volume 15, 22, 21 issue 5, 10, 4


Very Long Instruction Word (VLIW) architectures were promised to deliver far more than the 
factor of two or three that current architectures achieve from overlapped execution. Using a 
new type of compiler which compacts ordinary sequential code into long instruction words, a 
VLIW machine was expected to provide from ten to thirty times the performance of a more 
conventional machine built of the same implementation technology. Multiflow Computer, 
Inc., has now built a VLIW called the TRACE™ ...

7 Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC
Design 

S. Yoo, G. Nicolescu, L. Gauthier, A. Jerraya 

March 2002 Proceedings of the conference on Design, automation and test in Europe 

To enable fast and accurate evaluation of HW/SW implementation choices of on-chip
communication, we present a method to automatically generate timed OS simulation models.
The method generates the OS simulation models with the simulation environment as a
virtual processor. Since the generated OS simulation models use final OS code, the presented
method can mitigate the OS code equivalence problem. The generated model also
simulates different types of processor exceptions. This approach provides two orders ...

8 OMP: a RISC-based multiprocessor using orthogonal-access memories and multiple
spanning buses

K. Hwang, M. Dubois, D. K. Panda, S. Rao, S. Shang, A. Uresin, W. Mao, H. Nair, M. Lytwyn, F. 
This paper presents the architectural design and RISC based implementation of a prototype
supercomputer, namely the Orthogonal Multiprocessor (OMP). The OMP system is
constructed with 16 Intel i860 RISC microprocessors and 256 parallel memory modules,
which are 2-D interleaved and orthogonally accessed using custom-designed spanning 
buses. The architectural design has been validated by a CSIM-based multiprocessor 
simulator. The design choices are based on worst-case delay a ... 

9 Making operating systems more robust: improving the reliability of commodity 
operating systems

Michael M. Swift, Brian N. Bershad, Henry M. Levy 

October 2003 Proceedings of the nineteenth ACM symposium on Operating systems 
principles 
Despite decades of research in extensible operating system technology, extensions such as
device drivers remain a significant cause of system failures. In Windows XP, for example, 
drivers account for 85% of recently reported failures. This paper describes Nooks, a 
reliability subsystem that seeks to greatly enhance OS reliability by isolating the OS from 
driver failures. The Nooks approach is practical: rather than guaranteeing complete fault 
tolerance through a new (and incompatible) OS ... 

Keywords: I/O, device drivers, protection, recovery, virtual memory 

We present a design flow for the generation of application-specific multiprocessor
architectures. In the flow, architectural parameters are first extracted from a high-level 
system specification. Parameters are used to instantiate architectural components, such as 
processors, memory modules and communication networks. The flow includes the automatic 
generation of communication coprocessor that adapts the processor to the communication 
network in an application-specific way. Experiments with ... 

11 The interaction of architecture and operating system design 
Thomas E. Anderson, Henry M. Levy, Brian N. Bershad, Edward D. Lazowska 

April 1991 Proceedings of the fourth international conference on Architectural support 
Despite the fact that large-scale shared-memory multiprocessors have been commercially
available for several years, system software that fully utilizes all their features is still not
available, mostly due to the complexity and cost of making the required changes to the
operating system. A recently proposed approach, called Disco, substantially reduces this
development cost by using a virtual machine monitor that leverages the existing operating
system technology. In this paper we present a ... 

Keywords: fault containment, resource management, scalable multiprocessors, virtual
machines 

Despite the fact that large-scale shared-memory multiprocessors have been commercially
available for several years, system software that fully utilizes all their features is still not 
available, mostly due to the complexity and cost of making the required changes to the 
operating system. A recently proposed approach, called Disco, substantially reduces this 
development cost by using a virtual machine monitor that leverages the existing operating 
system technology. In this paper we present a syste ... 

14 The Clipper processor: instruction set architecture and implementation 
W. Hollingsworth, H. Sachs, A. J. Smith 

February 1989 Communications of the ACM, volume 32 issue 2 
Intergraph's CLIPPER microprocessor is a high performance, three chip module that
implements a new instruction set architecture designed for convenient programmability, 
broad functionality, and easy future expansion. 

15 Poster session 2: Cycle-accurate power analysis for multiprocessor systems-on-a-chip
Mirko Loghi, Massimo Poncino, Luca Benini 

April 2004 Proceedings of the 14th ACM Great Lakes symposium on VLSI
Developing energy-aware software for multiprocessor systems-on-chip (MPSoCs) is a
difficult task, which requires the knowledge of the distribution of the power consumption 
among several heterogeneous devices (cores, memories, busses, etc.). In this work we 
analyze the power breakdowns of power consumption for a complete MPSoC platform, under 
several application workloads and operating conditions. We leverage a complete-system 
simulation platform with accurate power models for all key hardware mo ... 

Keywords: low power, multiprocessor, system-on-chip 

The advantages of using message passing over shared memory for certain types of
communication and synchronization have provided an incentive to integrate both models 
within a single architecture. A key goal of the FLASH (FLexible Architecture for SHared 
memory) project at Stanford is to achieve this integration while maintaining a simple and 
efficient design. This paper presents the hardware and software mechanisms in FLASH to 
support various message passing protocols. We achieve low overhe ... 

This paper describes Embra, a simulator for the processors, caches, and memory systems of
uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS 
simulation environment, Embra models the processors of a MIPS R3000/R4000 machine 
faithfully enough to run a commercial operating system and arbitrary user applications. To 
achieve high simulation speed, Embra uses dynamic binary translation to generate code 
sequences which simulate the workload. It is the first machine simu ... 

Current shared-memory multiprocessors are inherently vulnerable to faults: any significant
hardware or system software fault causes the entire system to fail. Unless provisions are 
made to limit the impact of faults, users will perceive a decrease in reliability when they 
entrust their applications to larger machines. This paper shows that fault containment 
techniques can be effectively applied to scalable shared-memory multiprocessors to reduce 
the reliability problems created by increased mach ... 

Richard L. Sites

February 1993 Communications of the ACM, volume 36 issue 2 
Multiprocessors have long been of interest to the computer community. They provide the
potential for accelerating applications through parallelism and increased throughput for large
multi-user systems. Three factors have limited the commercial success of multiprocessor
systems: entry cost, range of performance, and ease of application. Advances in very large
scale integration (VLSI) and in computer aided design (CAD) have removed these 
limitations, making possible a new class of multiprocessor system ... 